Metadata editing apparatus, metadata reproduction apparatus, metadata delivery apparatus, metadata search apparatus, metadata re-generation condition setting apparatus, metadata delivery method and hint information description method

ABSTRACT

Multimedia content containing moving pictures and audio is divided into multiple scenes and metadata is generated for each of the scenes obtained as a result of the division. It is possible to generate metadata containing scene structure information metadata that describes the hierarchical structure of the content in addition to scene section information and titles. Also, a name or an identifier of each descriptor contained in the metadata is described as hint information for manipulation of metadata composed of at least one descriptor describing semantic content, a structure, and characteristics of content.

This application is a Divisional of co-pending application Ser. No. 10/510,548, filed on Oct. 8, 2004, and for which priority is claimed under 35 U.S.C. § 120. Application Ser. No. 10/510,548 is the national phase of PCT International Application No. PCT/JP03/03450, filed on Mar. 20, 2003 under 35 U.S.C. § 371, which claims priority from Japanese Application Nos. 2002-110259 filed on Apr. 12, 2002, and 2002-178169 filed on Jun. 19, 2002. The entire contents of each of the above-identified applications are hereby incorporated by reference.

TECHNICAL FIELD

The present invention relates to a metadata editing apparatus, a metadata reproduction apparatus, a metadata delivery apparatus, a metadata search apparatus, a metadata re-generation condition setting apparatus, a content delivery apparatus, and a metadata delivery method, with which, for instance, multimedia content containing moving pictures and audio is divided into multiple scenes and metadata is generated for each of the scenes obtained as a result of the division.

BACKGROUND ART

A conventional moving picture management apparatus is known which includes: a means for dividing a video into multiple scenes and editing and creating at least one index as an assembly of section information necessary for reproduction of each scene, a scene number assigned to each scene, and a representative image of each scene; means for giving a title to each index for the purpose of searching; and means for searching for a desired index using a corresponding title and successively reproducing scenes of the index in order of the scene numbers. With this construction, it becomes possible to reproduce only necessary scenes by editing an index in which the necessary scenes are arranged (see Japanese Patent Laid-Open No. 2001-028722 (page 1, FIG. 1), for instance).

With the moving picture management apparatus described above, however, metadata is merely created using the section information necessary for the scene reproduction, the scene number, and the scene representative image. Therefore, there remains a problem that it is impossible to also manage the structure of video data such as the hierarchical property of the video data.

Also, at the time of searching for a registered image, the title given to a corresponding index is used, causing a disadvantage in that in order to obtain an appropriate search result, an appropriate title needs to be input.

The present invention has been made in order to solve the problems described above. Therefore, it is an object of the present invention to provide a metadata editing apparatus capable of generating metadata that is index information showing the structure and the like of content (video data, for instance) in addition to scene section information and titles.

It is another object of the present invention to provide a metadata reproduction apparatus, a metadata delivery apparatus, a metadata search apparatus, a metadata re-generation condition setting apparatus, a content delivery apparatus, and a metadata delivery method, with which it is possible to collect and reproduce only scenes which a user wishes to watch using the metadata generated by the metadata editing apparatus, or to search for the scenes desired by the user using characteristic amounts or the like described in the metadata.

DISCLOSURE OF INVENTION

A metadata editing apparatus according to the present invention is provided with: a scene division unit for dividing multimedia content containing at least one of moving pictures and audio into a plurality of scenes to generate scene section information metadata indicating a scene start position and a scene end position for each scene obtained as a result of the division; a scene description edit unit for performing hierarchical editing of each scene of the multimedia content based on the scene section information metadata sent from the scene division unit and generating scene structure information metadata describing a hierarchical structure of the multimedia content; and a metadata description unit for integrating the scene section information metadata and the scene structure information metadata and generating metadata describing contents and a structure of the multimedia content in accordance with a predetermined format.

Further a metadata delivery apparatus according to the present invention is provided with: a hint information analysis unit for analyzing metadata optimization hint information describing a type and content of each descriptor contained in metadata; a metadata analysis/re-generation unit for analyzing metadata describing contents and a structure of multimedia content containing at least one of moving pictures and audio based on the analyzed metadata optimization hint information and a condition for metadata re-generation and re-generating second metadata; and a metadata delivery unit for delivering the second metadata re-generated by the metadata analysis/re-generation unit to a client terminal.

Further a metadata delivery method according to the present invention includes the steps of: analyzing metadata optimization hint information describing a type of each descriptor contained in metadata; re-generating second metadata by analyzing the metadata describing contents and a structure of multimedia content containing at least one of moving pictures and audio based on the analyzed metadata optimization hint information and a condition for re-generation of the metadata; and delivering the re-generated second metadata to a client terminal.

Further a hint information description method according to the present invention includes the steps of: describing, as hint information for manipulation of metadata composed of at least one descriptor describing semantic content, a structure, and characteristics of content, a name or an identifier of each descriptor contained in the metadata.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram showing a construction of a metadata editing apparatus according to a first embodiment of the present invention.

FIG. 2 shows a news video that is an example of a target of editing by the metadata editing apparatus according to the first embodiment of the present invention.

FIG. 3 shows an example of scene section information metadata of a scene division unit of the metadata editing apparatus according to the first embodiment of the present invention.

FIG. 4 shows an example of scene structure information metadata of a scene description edit unit of the metadata editing apparatus according to the first embodiment of the present invention.

FIG. 5 shows examples of screen images of a content reproduction/display unit and a user input unit of the metadata editing apparatus according to the first embodiment of the present invention.

FIG. 6 is a block diagram showing a construction of a metadata editing apparatus according to a second embodiment of the present invention.

FIG. 7 illustrates how the metadata editing apparatus according to the second embodiment of the present invention operates.

FIG. 8 is a block diagram showing a construction of a metadata reproduction apparatus according to a third embodiment of the present invention.

FIG. 9 illustrates how the metadata reproduction apparatus according to the third embodiment of the present invention operates.

FIG. 10 is a block diagram showing a construction of a content delivery system according to a fourth embodiment of the present invention.

FIG. 11 shows content (in this case, a news video) structure information outputted from a metadata analysis unit of a metadata delivery server according to the fourth embodiment of the present invention.

FIG. 12 shows an example of a structure of content after restructuring by a metadata re-generation unit of the content delivery system according to the fourth embodiment of the present invention.

FIG. 13 is a block diagram showing a construction of a metadata delivery server according to a fifth embodiment of the present invention.

FIG. 14 shows an example of video content, with reference to which processing of metadata optimization hint information by the metadata delivery server according to the fifth embodiment of the present invention is described.

FIG. 15 shows how metadata is described in MPEG-7 by the metadata delivery server according to the fifth embodiment of the present invention.

FIG. 16 shows an example of a format of the metadata optimization hint information used by the metadata delivery server according to the fifth embodiment of the present invention.

FIG. 17 shows the metadata optimization hint information used by the metadata delivery server according to the fifth embodiment of the present invention.

FIG. 18 is a flowchart showing how a metadata analysis/re-generation unit of the metadata delivery server according to the fifth embodiment of the present invention operates.

FIG. 19 is another flowchart showing how the metadata analysis/re-generation unit of the metadata delivery server according to the fifth embodiment of the present invention operates.

FIG. 20 is a block diagram showing a construction of a metadata search server according to a sixth embodiment of the present invention.

FIG. 21 is a flowchart showing how a metadata analysis unit of the metadata search server according to the sixth embodiment of the present invention operates.

FIG. 22 is a block diagram showing a construction of a client terminal according to a seventh embodiment of the present invention.

FIG. 23 is a block diagram showing a construction of a content delivery server according to an eighth embodiment of the present invention.

BEST MODE FOR CARRYING OUT THE INVENTION

Embodiments of the present invention will now be described with reference to the accompanying drawings, with a first embodiment relating to a metadata editing apparatus, a second embodiment relating to another metadata editing apparatus, a third embodiment relating to a metadata reproduction apparatus, a fourth embodiment relating to a content delivery system, a fifth embodiment relating to a metadata delivery server, a sixth embodiment relating to a metadata search server, a seventh embodiment relating to a client terminal, and an eighth embodiment relating to a content delivery server.

FIRST EMBODIMENT

In this first embodiment, a metadata editing apparatus will be described which divides multimedia content containing moving pictures and audio into multiple scenes and creates metadata (index information) containing descriptions of a scene hierarchical structure and characteristic amounts of each scene.

The metadata editing apparatus according to the first embodiment of the present invention will be described with reference to the accompanying drawings. FIG. 1 is a block diagram showing a construction of the metadata editing apparatus according to the first embodiment of the present invention. Note that in each drawing, the same reference numerals denote the same or equivalent portions.

Referring to FIG. 1, a metadata editing apparatus 100 includes a content reproduction/display unit 2, a scene division unit 3, a thumbnail image generation unit 4, a scene description edit unit 5, a text information giving unit 6, a characteristic extraction unit 7, a user input unit 8, and a metadata description unit 9.

The content reproduction/display unit 2 reproduces and displays multimedia content 10 that includes video data and audio data and is a target of editing. The scene division unit 3 divides the content into multiple scenes. The thumbnail image generation unit 4 extracts a representative frame of each scene as a thumbnail image. The scene description edit unit 5 hierarchically edits the scenes obtained as a result of the division by the scene division unit 3 through scene grouping, scene combining, scene deletion, generation of information that shows relations among the scenes, and the like. The text information giving unit 6 gives various types of text information to each scene. The characteristic extraction unit 7 extracts characteristics of each scene.

Also, the user input unit 8 receives input of designation information from a user and outputs it to the content reproduction/display unit 2, the scene division unit 3, the thumbnail image generation unit 4, the scene description edit unit 5, and the text information giving unit 6 as user input information 11.

Further, the metadata description unit 9 integrates scene section information metadata 12, scene thumbnail image information metadata 13, scene structure information metadata 14, text information metadata 15, and characteristic description metadata 16 outputted from the scene division unit 3, the thumbnail image generation unit 4, the scene description edit unit 5, the text information giving unit 6, and the characteristic extraction unit 7, respectively. The metadata description unit 9 then generates metadata 17 describing the contents and structure of the multimedia content in accordance with a specified format.

Next, how the metadata editing apparatus according to the first embodiment operates will be described with reference to the accompanying drawings. FIG. 2 shows a construction of a news video that is an example of a target of editing by the metadata editing apparatus according to the first embodiment.

A case where the news video having the construction shown in FIG. 2 is edited will be described as an example.

First, the content reproduction/display unit 2 of the metadata editing apparatus 100 receives input of the multimedia content 10, such as video content, stored in a content storage unit (not shown) via a network or the like, and reproduces/displays the multimedia content 10 for editing.

When the user of the metadata editing apparatus 100 inputs positions for clipping a scene, which is to say a scene start position and a scene end position, using the user input unit 8 while watching the reproduced video, the scene division unit 3 generates the scene section information metadata 12 showing the scene start position and the scene end position inputted from the user.

FIG. 3 shows an example of the scene section information metadata generated by the scene division unit of the metadata editing apparatus according to the first embodiment.

Here, the scene section information metadata 12 shown in FIG. 3 was generated from the news video shown in FIG. 2. As shown in FIG. 3, the scene section information metadata 12 generated by the scene division unit 3 gives the scene start position and the scene end position of each scene clipped from the news video content, such as a “news digest” scene, a “domestic news” scene, and an “international news” scene.

On receiving designation of scene editing from the user via the user input unit 8, the scene description edit unit 5 performs hierarchical editing of the scenes continuously clipped by the scene division unit 3 based on the scene section information metadata 12 from the scene division unit 3, and then outputs the scene structure information metadata 14. Here, the scene hierarchical editing refers to scene grouping, scene re-division, scene combining, or scene deletion, for instance. The scene grouping refers to grouping of scenes that are related to each other with respect to specific characteristics into a single group. For instance, as shown in FIG. 4, the “domestic news” scene, the “international news” scene, and the “financial news” scene of the news video shown in FIG. 2 are grouped into a single “news” group. Also, the scene re-division refers to division of a single scene into multiple scenes and the scene combining refers to generation of a single scene by combining multiple scenes with each other.
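By way of illustration only, the scene grouping, scene re-division, and scene combining described above may be sketched as operations on a simple tree of scenes. The class name `Scene`, the function names, and the start and end times in seconds are hypothetical and are not taken from the drawings; the sketch merely assumes that each scene carries a start position and an end position.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Scene:
    """Hypothetical scene node: a title, a section, and child scenes."""
    title: str
    start: float  # scene start position in seconds (assumed unit)
    end: float    # scene end position in seconds (assumed unit)
    children: List["Scene"] = field(default_factory=list)


def group_scenes(parent: Scene, titles: set, group_title: str) -> Scene:
    """Scene grouping: move the named child scenes under one new group node."""
    members = [c for c in parent.children if c.title in titles]
    if not members:
        raise ValueError("no matching scenes to group")
    group = Scene(group_title,
                  min(m.start for m in members),
                  max(m.end for m in members),
                  members)
    first = parent.children.index(members[0])
    parent.children = [c for c in parent.children if c.title not in titles]
    parent.children.insert(first, group)
    return group


def redivide_scene(scene: Scene, at: float):
    """Scene re-division: split one scene into two at position `at`."""
    if not scene.start < at < scene.end:
        raise ValueError("split position must lie inside the scene")
    return (Scene(scene.title + " (1)", scene.start, at),
            Scene(scene.title + " (2)", at, scene.end))


def combine_scenes(first: Scene, second: Scene, title: str) -> Scene:
    """Scene combining: generate a single scene covering both inputs."""
    return Scene(title, min(first.start, second.start),
                 max(first.end, second.end))


# Usage with made-up times for the news video example.
video = Scene("news video", 0.0, 1800.0, [
    Scene("news digest", 0.0, 120.0),
    Scene("domestic news", 120.0, 600.0),
    Scene("international news", 600.0, 900.0),
    Scene("financial news", 900.0, 1200.0),
    Scene("special", 1200.0, 1500.0),
    Scene("sports", 1500.0, 1800.0),
])
group_scenes(video,
             {"domestic news", "international news", "financial news"},
             "news")
```

In this sketch the group node takes the earliest start position and the latest end position of its members, which is one possible convention and not a requirement of the apparatus.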

FIG. 4 shows an example of the scene structure information metadata generated by the scene description edit unit of the metadata editing apparatus according to the first embodiment.

The scene structure information metadata 14 shown in FIG. 4 describes the hierarchical structure of the video content generated as a result of the editing by the scene description edit unit 5. In FIG. 4, a “news” scene is edited into multiple scenes, such as a “news digest” scene, a “news” scene, a “special” scene, and a “sports” scene, and the “news” scene is further hierarchically edited into a “domestic news” scene, an “international news” scene, and a “financial news” scene by the scene description edit unit 5 through scene editing such as the scene grouping, scene re-division, and scene combining.

Then, the metadata 14 generated by the scene description edit unit 5, such as the metadata shown in FIG. 4, is outputted to the metadata description unit 9.

On the other hand, the thumbnail image generation unit 4 generates a representative frame of each scene clipped by the scene division unit 3 as a thumbnail image based on the scene section information metadata 12 from the scene division unit 3, and outputs information concerning the generated thumbnail image as the thumbnail image information metadata 13 to the metadata description unit 9, in which the thumbnail image information metadata 13 is registered. Here, it is possible for the user to perform selection of the thumbnail using the user input unit 8, although it is also possible to automatically set a head frame or each frame clipped at fixed time intervals as the representative frame or to automatically detect each scene change point and set a frame at each detected point as the representative frame. The thumbnail image information metadata 13 is information showing the position (such as the frame number or time) of the thumbnail in the video content or information giving the location (such as the URL) of the thumbnail image.

Also, the characteristic extraction unit 7 extracts visual characteristic amounts possessed by each scene, such as motions, colors, or shapes of objects contained in the scene, from the scene based on the scene section information metadata 12 from the scene division unit 3. The extracted characteristic amounts are outputted to the metadata description unit 9 as the characteristic description metadata 16 and are registered therein.

Also, the text information giving unit 6 gives various types of text information, such as a title, an abstract, a keyword, a comment, and scene importance, designated by the user to each scene based on the scene section information metadata 12 from the scene division unit 3. Here, the text information may be given through the user's input using the user input unit 8 or be automatically given through analysis of audio information and captions contained in the content. The text information is outputted to the metadata description unit 9 and is registered therein as the text information metadata 15.

FIG. 5 shows examples of screen images displayed by the content reproduction/display unit 2 and the user input unit 8 of the metadata editing apparatus according to the first embodiment. In FIG. 5, a video reproduction screen G1 is an example of the screen image displayed by the content reproduction/display unit 2, with content to be edited being reproduced/displayed on this video reproduction screen G1. Although not clearly shown in FIG. 5, as in the case of an ordinary video reproduction apparatus, a user interface is also provided which includes buttons and the like for commanding “reproduction”, “stop”, “rewind”, “fast forward”, “frame advance”, and other operations. Also, below the video reproduction screen G1, a scene division designation screen G2 is displayed which has a slider form, for instance. The user designates a scene start position and a scene end position of the video displayed on the video reproduction screen G1 through this scene division designation screen G2 while watching the video displayed on the video reproduction screen G1. Also, the user simultaneously designates the position of a thumbnail between the scene start position and the scene end position through the scene division designation screen G2. Here, when the thumbnail position is designated through the scene division designation screen G2, the thumbnail image generation unit 4 generates a thumbnail image from a frame of the video content at the designated position.

Also, the thumbnail image, whose position has been designated through the scene division designation screen G2, is displayed on a scene division information display screen G3 as scene division information. Here, on this scene division information display screen G3, it is also possible to display information showing the scene start position and the scene end position in addition to the thumbnail image, as shown in FIG. 3.

Next, the user designates scene editing through a tree structure generation designation/display screen G4. That is, the user generates a tree showing the hierarchical structure possessed by the video content while watching the scene division information, such as the thumbnail image, displayed on the scene division information display screen G3.

When performing the scene grouping, the user uses a manipulation method with which, for instance, a new node is added to the tree and each scene that should be grouped is added to the node. In order to perform the scene addition, the user may use a method with which a scene that should be added is selected on the scene division information display screen G3 and the selected scene is added to the node by a drag-and-drop operation. Here, it is possible for the user to input text information for the selected scene from the scene division information display screen G3 or the tree structure generation designation/display screen G4 using the user input unit 8 provided as a user interface for giving the text information to the scene via the text information giving unit 6.

The metadata description unit 9 generates a metadata file described in accordance with a specified description format by integrating the various types of metadata outputted from the scene division unit 3, the thumbnail image generation unit 4, the scene description edit unit 5, the text information giving unit 6, and the characteristic extraction unit 7. The specified metadata description format may be a uniquely determined format, although MPEG-7 standardized by ISO is used in this first embodiment. MPEG-7 stipulates a format for describing the structure and characteristics of content and includes an XML file format and a binary format.
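As a minimal sketch of this integration step, the five kinds of metadata may be merged into one hierarchical XML description as shown below. The element and attribute names (`Scene`, `Section`, `Thumbnail`, `Text`, `Feature`) and the input dictionaries are hypothetical stand-ins for a uniquely determined format; they are not the MPEG-7 schema, and the sketch makes no claim about how MPEG-7 descriptions are actually laid out.

```python
import xml.etree.ElementTree as ET


def build_element(node, sections, thumbnails, texts, features):
    """Build one hypothetical <Scene> element and, recursively, its children.

    node is a (scene_id, [child nodes]) pair taken from the scene structure.
    """
    scene_id, children = node
    element = ET.Element("Scene", id=scene_id)
    if scene_id in sections:  # scene section information metadata
        start, end = sections[scene_id]
        ET.SubElement(element, "Section", start=str(start), end=str(end))
    if scene_id in thumbnails:  # thumbnail image information metadata
        ET.SubElement(element, "Thumbnail", frame=str(thumbnails[scene_id]))
    for kind, value in texts.get(scene_id, {}).items():  # text information
        ET.SubElement(element, "Text", type=kind).text = value
    for name, values in features.get(scene_id, {}).items():  # characteristics
        feature = ET.SubElement(element, "Feature", name=name)
        feature.text = " ".join(str(v) for v in values)
    for child in children:  # scene structure information metadata
        element.append(
            build_element(child, sections, thumbnails, texts, features))
    return element


def describe_metadata(structure, sections, thumbnails, texts, features):
    """Integrate the separate metadata into a single XML string."""
    root = build_element(structure, sections, thumbnails, texts, features)
    return ET.tostring(root, encoding="unicode")


# Usage with made-up values.
xml_text = describe_metadata(
    ("news", [("domestic news", []), ("international news", [])]),
    sections={"domestic news": (120, 600), "international news": (600, 900)},
    thumbnails={"domestic news": 130},
    texts={"domestic news": {"title": "Domestic news"}},
    features={"domestic news": {"color_histogram": [3, 1, 0]}},
)
```

Keeping the hierarchy in the nesting of the elements means that a reader of the file can recover the scene tree without any separate index.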

As described above, the metadata editing apparatus 100 of the first embodiment is provided with the scene description edit unit 5 for hierarchically editing scenes and the characteristic extraction unit 7 for extracting characteristics from the scenes, so that it becomes possible to generate metadata describing the hierarchical structure possessed by content, such as video data, and characteristic amounts of each scene.

It should be noted here that the multimedia content 10 inputted into the content reproduction/display unit 2 is obtained from a content server (not shown) existing on a network, from a content storage unit (not shown) in the metadata editing apparatus 100, or from an accumulation medium (not shown) such as a CD or a DVD, for instance. In a like manner, the metadata outputted from the metadata description unit 9 is accumulated in a metadata server (not shown) existing on a network, in a metadata accumulation unit (not shown) in the metadata editing apparatus, or in an accumulation medium (not shown), such as a CD or a DVD, together with content, for instance.

Also, in the first embodiment, a case where the metadata editing apparatus 100 is provided with both of the scene description edit unit 5 and the characteristic extraction unit 7 has been described. However, the present invention is not limited to this and it is of course possible to provide the metadata editing apparatus 100 with only one of the scene description edit unit 5 and the characteristic extraction unit 7.

SECOND EMBODIMENT

In the first embodiment described above, every scene is divided manually. However, a metadata editing apparatus to be described in this second embodiment is provided with a scene change detection unit for automatically detecting each scene change point.

The metadata editing apparatus according to the second embodiment of the present invention will be described with reference to the accompanying drawings. FIG. 6 is a block diagram showing a construction of the metadata editing apparatus according to the second embodiment of the present invention.

Referring to FIG. 6, a metadata editing apparatus 100A includes a content reproduction/display unit 2, a scene division unit 3, a thumbnail image generation unit 4, a scene description edit unit 5, a text information giving unit 6, a characteristic extraction unit 7, a user input unit 8, a metadata description unit 9, and a scene change detection unit 39. Note that reference numeral 40 denotes scene start position information which is automatically detected.

Next, how the metadata editing apparatus according to the second embodiment operates will be described with reference to the accompanying drawings.

FIG. 7 illustrates how the metadata editing apparatus according to the second embodiment of the present invention operates.

The construction elements other than the scene change detection unit 39 and the scene division unit 3 operate in the same manner as in the first embodiment described above. Therefore, operations unique to the second embodiment will be described below.

The scene change detection unit 39 automatically detects each scene change/cut point. This scene change detection is performed based on a difference in pixel between frames, a difference in color between the frames, a difference in luminance histogram between the frames, or the like, for instance. The scene division unit 3 determines a scene start position and a scene end position based on each scene change point detected by the scene change detection unit 39.

Hereinafter, processing by the scene change detection unit 39 and the scene division unit 3 will be described in detail by taking, as an example, a case where a news video is content that is a target of editing.

A case where a color histogram is used as characteristic amounts for the scene change detection will be described as an example.

The scene change detection unit 39 calculates a color histogram for each frame. As a color system, HSV, RGB, YCbCr, and the like are available, although an HSV color space is used in this example. This HSV color space is composed of three elements called “hue (H)”, “saturation (S)”, and “value (V)”. A histogram of each element is calculated. Next, from the obtained histograms, a difference in histogram between frames is calculated based on Equation 1 given below, for instance. Here, it is assumed that frames from a scene start frame to the Nth frame (N=3, for instance) belong to the same scene, that is, do not contain any scene change point. Note that as the initial characteristic amounts of the scene, a mean value (mean) and a standard deviation (sd) of the differences in histogram between the first N frames are obtained based on Equation 2 given below.

$$\mathrm{sum}_i = \sum_{k=1}^{\mathrm{bin\_H}} \left| H_i(k) - H_{i-1}(k) \right| + \sum_{k=1}^{\mathrm{bin\_S}} \left| S_i(k) - S_{i-1}(k) \right| + \sum_{k=1}^{\mathrm{bin\_V}} \left| V_i(k) - V_{i-1}(k) \right| \qquad \text{(Equation 1)}$$

$\mathrm{sum}_i$: a sum of differences in histogram between a frame $i$ and a frame $i-1$
$H_i(k)$: a hue histogram, bin_H: the number of elements of the histogram
$S_i(k)$: a saturation histogram, bin_S: the number of elements of the histogram
$V_i(k)$: a value histogram, bin_V: the number of elements of the histogram

$$\mathrm{mean} = \frac{1}{N-1} \sum_{i=1}^{N-1} \mathrm{sum}_i, \qquad \mathrm{sd} = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N-1} \left( \mathrm{sum}_i - \mathrm{mean} \right)^2} \qquad \text{(Equation 2)}$$

mean: a mean value of the differences in histogram between the frames
sd: a standard deviation of the differences in histogram between the frames

Then, each frame, from the N+1th and the subsequent frames, that has an inter-frame difference in histogram greater than “mean + λ·sd” is regarded as a scene change point and is set as a new scene start position candidate.
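The detection of scene start position candidates from Equation 1, Equation 2, and the threshold “mean + λ·sd” may be sketched as follows, purely as an illustration. The function names, the representation of each frame as a dictionary of H, S, and V histograms, and the default value of λ are assumptions; no value of λ is specified here, and the sketch keeps the initial mean and sd fixed rather than updating them after a detected scene change. Frames are indexed from 0, so the N+1th frame has index N.

```python
from math import sqrt


def histogram_difference(previous, current):
    """Equation 1: sum of absolute bin differences over the H, S, V histograms."""
    return sum(
        abs(c - p)
        for channel in ("H", "S", "V")
        for p, c in zip(previous[channel], current[channel])
    )


def detect_scene_change_candidates(frames, n_init=3, lam=3.0):
    """Return indices of frames regarded as new scene start position candidates.

    frames: list of dicts with "H", "S", "V" histograms (lists of bin counts).
    n_init: N, the number of leading frames assumed to belong to one scene.
    lam:    the coefficient lambda (hypothetical default value).
    """
    if n_init < 2 or len(frames) < n_init:
        raise ValueError("need at least N >= 2 frames for the initial statistics")
    # diffs[j] is the difference between frame j + 1 and frame j.
    diffs = [histogram_difference(frames[i - 1], frames[i])
             for i in range(1, len(frames))]
    initial = diffs[:n_init - 1]  # the N - 1 differences among the first N frames
    mean = sum(initial) / len(initial)                           # Equation 2
    sd = sqrt(sum((d - mean) ** 2 for d in initial) / len(initial))
    threshold = mean + lam * sd
    return [i for i in range(n_init, len(frames)) if diffs[i - 1] > threshold]
```

For instance, with three similar leading frames followed by a frame whose histograms differ sharply, only the index of that frame is returned.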

In this manner, multiple scene start position candidates are obtained. Next, there will be considered a case where, as in the case of a news video, an image having a predetermined pattern is inserted at switching between news or the like.

In many cases, in a news video, an image having a predetermined pattern, such as an image composed of an announcer, a studio set on the background, and a character description (caption), is inserted at switching between news, for instance. Accordingly, the image having the predetermined pattern (hereinafter referred to as the “template image”) or metadata describing the characteristic amounts of the template image is registered in advance. For instance, the characteristic amounts of the template image are the color histogram of the template image, the motion pattern (for instance, less motion is observed in the area of an announcer at switching between news), or the like.

When the template image is registered in advance, each image corresponding to a scene change point is matched against the template image, as shown in FIG. 7. Then, if the similarity therebetween is high, the scene change point is registered as a scene start position. The similarity may be judged based on inter-frame differences, inter-frame differences in color histogram, or the like.

Also, when the characteristic amounts of the template image are registered in advance, characteristic amounts of each image corresponding to a scene change point are extracted and are matched against the characteristic amounts of the template image. If the similarity therebetween is high, the scene change point is registered as a scene start position. Then, information showing the scene start position is outputted to the scene division unit 3.
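The matching of scene change points against registered characteristic amounts of the template image may be sketched, for illustration, with a color histogram as the characteristic amount. The function names and the `max_distance` threshold are hypothetical; how high the similarity must be is not specified, so the threshold is left as a parameter.

```python
def histogram_distance(a, b):
    """Sum of absolute bin differences; a smaller value means higher similarity."""
    return sum(abs(x - y) for x, y in zip(a, b))


def confirm_scene_starts(candidates, frame_histograms, template_histogram,
                         max_distance):
    """Keep only the candidates whose frame resembles the template image.

    candidates:         frame indices detected as scene change points.
    frame_histograms:   color histogram of every frame, indexed by frame number.
    template_histogram: registered color histogram of the template image.
    max_distance:       hypothetical threshold on the histogram distance.
    """
    return [
        index for index in candidates
        if histogram_distance(frame_histograms[index],
                              template_histogram) <= max_distance
    ]
```

A candidate that is a mere camera cut inside a news item would typically lie far from the template histogram and would therefore not be registered as a scene start position.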

The scene division unit 3 determines a scene start position and a scene end position based on the information showing the scene start position automatically detected by the scene change detection unit 39. Note that the scene division unit 3 of the second embodiment is also capable of determining the scene start position and the scene end position based on designation from the user, as in the first embodiment described above.
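One simple way to derive scene sections from the detected scene start positions is sketched below. The rule that each scene ends at the frame immediately before the next scene starts, and the function name, are assumptions made for illustration; the manner in which the scene end position is determined is not detailed here.

```python
def sections_from_starts(start_frames, total_frames):
    """Turn scene start frames into (start, end) sections, inclusive of both ends.

    Assumes each scene ends one frame before the next scene starts and that
    the last scene runs to the final frame of the content.
    """
    starts = sorted(set(start_frames))
    sections = []
    for position, start in enumerate(starts):
        if position + 1 < len(starts):
            end = starts[position + 1] - 1
        else:
            end = total_frames - 1
        sections.append((start, end))
    return sections
```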

It is also possible for the scene change detection unit 39 to detect each scene change point contained in a scene with reference to each scene start position and each scene end position described in the scene section information metadata 12 outputted from the scene division unit 3 to the scene change detection unit 39.

The scene description edit unit 5 re-divides or integrates the scenes automatically detected by the scene change detection unit 39 based on the scene section information metadata 12 from the scene division unit 3. Note that the details of the scene description edit unit 5 are the same as those in the first embodiment described above.

As described above, with the metadata editing apparatus 100A according to the second embodiment, it becomes possible to generate metadata describing the hierarchical structure possessed by content, such as video data, and characteristic amounts of each scene, as in the first embodiment described above. In addition, the scene change detection unit 39 is provided, so that it becomes possible to automatically detect each scene change point in content.

THIRD EMBODIMENT

In this third embodiment, a metadata reproduction apparatus will be described which performs summary reproduction of images, searching, and the like using the metadata generated by the metadata editing apparatus according to the first embodiment or the second embodiment described above.

The metadata reproduction apparatus according to the third embodiment of the present invention will be described with reference to the accompanying drawings. FIG. 8 is a block diagram showing a construction of the metadata reproduction apparatus according to the third embodiment of the present invention.

Referring to FIG. 8, a metadata reproduction apparatus 200 includes a metadata analysis unit 19, a structure display unit 20, a thumbnail image display unit 21, a user input unit 22, a search unit 23, a search result display unit 24, a summary creation unit 25, a summary structure display unit 26, and a content reproduction unit 27.

The metadata analysis unit 19 performs analysis of metadata 28 describing the hierarchical scene structure possessed by content, information concerning the thumbnail of each scene, the characteristic amounts of each scene, and the like. The structure display unit 20 displays a scene structure 29 obtained as a result of the metadata analysis, that is, the hierarchical structure of the content. The thumbnail image display unit 21 displays thumbnail image information 30 obtained as a result of the metadata analysis.

With the user input unit 22, a user inputs search designation, reproduction designation, and the like. The search unit 23 performs searching based on the search designation (search condition 31) from the user and the scene characteristic amounts or text information 32 obtained from the metadata. The search result display unit 24 displays a result 33 of the searching. The summary creation unit 25 performs creation of a summary based on summary creation designation (summary creation condition 34) from the user. The summary structure display unit 26 displays a structure 38 of summarized content. The content reproduction unit 27 reproduces/displays the content based on summary information 35, content reproduction designation 36, and content 37 to be reproduced.

Next, how the metadata reproduction apparatus according to the third embodiment operates will be described with reference to the accompanying drawings.

First, the metadata analysis unit 19 receives input of the metadata 28 describing the hierarchical scene structure possessed by the content, information concerning the thumbnail of each scene, the characteristic amounts of each scene, and the like, and performs analysis of the metadata.

In the third embodiment, it is assumed that the metadata 28 is metadata generated by the metadata description unit 9 of the first embodiment or the second embodiment described above in a format stipulated by MPEG-7. Consequently, the metadata is a text file written in XML or a binary file encoded in the binary format.

If the metadata 28 is written in XML, the metadata analysis unit 19 serves as an XML parser that performs analysis of an XML file. On the other hand, if the metadata 28 is encoded in the binary format, the metadata analysis unit 19 serves as a decoder that performs decoding of the metadata 28.
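As a rough illustration of the XML case, the following sketch parses a nested scene description into a tree by using the XML parser of the Python standard library. The element name "Scene" and the attribute "title" are hypothetical simplifications and do not reproduce the actual MPEG-7 schema. Decoding of the binary format is not sketched and is only marked as a separate branch.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List


@dataclass
class Scene:
    title: str
    children: List["Scene"] = field(default_factory=list)


def parse_scene(elem: ET.Element) -> Scene:
    """Recursively convert a hypothetical <Scene title="..."> element into a Scene tree."""
    return Scene(elem.get("title", ""), [parse_scene(c) for c in elem.findall("Scene")])


def analyze_metadata(data: bytes) -> Scene:
    """Act as an XML parser for textual metadata; the binary branch is left as a stub."""
    if data.lstrip().startswith(b"<"):
        return parse_scene(ET.fromstring(data))
    raise NotImplementedError("decoding of the binary format is not sketched here")


# Hypothetical metadata with a two-level scene hierarchy.
xml_metadata = (
    b'<Scene title="Root">'
    b'<Scene title="A"/>'
    b'<Scene title="B"><Scene title="B1"/></Scene>'
    b'</Scene>'
)
root = analyze_metadata(xml_metadata)
```

The resulting tree is the kind of analysis result that a structure display unit could render in a tree form together with the title of each scene.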

The structure display unit 20 receives input of a result of the analysis by the metadata analysis unit 19 and displays the hierarchical scene structure 29 of the content. The scene structure of the content is displayed in a tree form together with the title of each scene, as shown in FIG. 4.

The thumbnail image display unit 21 receives input of the result of the analysis by the metadata analysis unit 19 (thumbnail image information 30) and displays a list of thumbnail images of the content.

The search unit 23 receives search designation from the user via the user input unit 22 and searches for a scene contained in the content. At this time, the user inputs a search condition by giving a keyword, a sample image, or the like via the user input unit 22. The search unit 23 searches for each scene matching the search condition 31, such as the keyword or the characteristics of the sample image, given by the user based on the scene characteristic amounts described in the metadata or the text information 32 giving scene titles and the like.
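The keyword case of the searching described above may be sketched as follows. This is an assumption-laden illustration only: each scene is represented as a dictionary having a "title" key and an optional "children" key, and matching is a case-insensitive substring test on the title. Searching based on a sample image and characteristic amounts is not covered.

```python
from typing import Dict, Iterator, List


def iter_scenes(scene: Dict) -> Iterator[Dict]:
    """Walk the hierarchical scene structure in document order."""
    yield scene
    for child in scene.get("children", []):
        yield from iter_scenes(child)


def search_by_keyword(root: Dict, keyword: str) -> List[str]:
    """Return the titles of all scenes whose title contains the keyword."""
    kw = keyword.lower()
    return [s["title"] for s in iter_scenes(root) if kw in s["title"].lower()]


# Hypothetical news video structure.
news = {
    "title": "News",
    "children": [
        {"title": "Sports: soccer"},
        {"title": "Weather"},
        {"title": "Sports: tennis", "children": [{"title": "Final set"}]},
    ],
}
hits = search_by_keyword(news, "sports")
```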

When the searching by the search unit 23 is finished, the search result display unit 24 receives input of the result 33 of the searching by the search unit 23 and performs displaying of the search result. As a method for displaying the search result, the thumbnail image of each scene matching the search condition is displayed, for instance.

Also, the summary creation unit 25 creates a summary of the content based on summary creation designation from the user via the user input unit 22. At this time, the user inputs information showing the reproduction time of summarized content, user preferences, and the like using the user input unit 22. When the content is a news video, for instance, the user inputs preference information showing that, for instance, he/she wishes to mainly watch sports news in the news video or to watch a 20-minute summary of the news video whose original length is one hour. The summary creation unit 25 also creates the summary information 35 matching the summary condition based on the scene reproduction times described in the metadata and the text information 32 giving the scene titles and the like. For instance, this summary information 35 is a reproduction list of scenes contained in the summarized content and is a list in which the location information, such as the URL, of the content is written together with the start position and end position of each scene in the content that the user wishes to reproduce.
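One conceivable way of building such a reproduction list is sketched below. The selection rule is an assumption made for illustration and is not stated in the embodiment: scenes of the preferred category are considered first, scenes are added greedily while the total length stays within the requested reproduction time, and the result is returned in temporal order. The class name, the field names, and the URL are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SceneInfo:
    title: str
    start: float  # start position in seconds
    end: float    # end position in seconds
    category: str


def create_summary(
    scenes: List[SceneInfo], url: str, max_seconds: float, preferred_category: str
) -> List[Dict]:
    """Build a reproduction list (location, start, end) that fits the time budget."""
    # Stable sort: preferred scenes first, original order kept within each group.
    ranked = sorted(scenes, key=lambda s: s.category != preferred_category)
    chosen: List[SceneInfo] = []
    used = 0.0
    for s in ranked:
        length = s.end - s.start
        if used + length <= max_seconds:
            chosen.append(s)
            used += length
    chosen.sort(key=lambda s: s.start)  # reproduce in temporal order
    return [{"url": url, "start": s.start, "end": s.end} for s in chosen]


# Hypothetical news video: the user prefers sports and asks for a 20-minute summary.
news_scenes = [
    SceneInfo("Politics", 0, 600, "politics"),
    SceneInfo("Soccer", 600, 1200, "sports"),
    SceneInfo("Weather", 1200, 1500, "weather"),
    SceneInfo("Tennis", 1500, 2100, "sports"),
]
summary = create_summary(news_scenes, "http://example.com/news.mpg", 1200, "sports")
```

With these hypothetical inputs, the two sports scenes exactly fill the 20-minute budget, so the reproduction list contains only those two scenes.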

Also, the content reproduction/display unit 27 specifies target content based on the location information of the content contained in the summary information 35, and performs obtainment/reproduction/display of each scene to be reproduced based on the scene list contained in the summary information. In another form, the summary information hierarchically describes the scene structure of the summarized content.

FIG. 9 shows an example of a hierarchical scene structure. FIG. 9(a) shows an example of a scene structure of original content. Each scene is given importance in a range of 0.0 to 1.0, with “1.0” meaning the highest importance and “0.0” meaning the lowest importance. The importance is calculated based on the user preferences, for instance. If the user preferences are registered in advance and indicate that he/she wishes to watch scenes of a soccer game of a team A and, in particular, to watch the result of the game and the goal scenes without fail, each scene is given importance reflecting the user preferences.

Following this, when summarization is performed using only scenes having the highest importance in FIG. 9(a), there is generated summarized content having the scene structure shown in FIG. 9(b). Note that each scene has metadata showing the location information, such as the URL, of the content containing the scene, the position information (the start position and the end position) of the scene in the content, and the like. Information concerning the scene structure 38 of the summarized content is passed to the summary structure display unit 26, which then displays the scene structure 38 in the tree form shown in FIG. 9(b).
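The summarization by importance may be pictured as pruning of the scene tree, as in the following sketch. It assumes, purely for illustration, that a scene is kept when its own importance reaches the threshold or when at least one of its descendants is kept; the class name and the example values are hypothetical and are not taken from FIG. 9.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    title: str
    importance: float = 0.0
    children: List["Node"] = field(default_factory=list)


def prune(node: Node, threshold: float = 1.0) -> Optional[Node]:
    """Return a copy of the tree containing only sufficiently important scenes."""
    kept = [p for p in (prune(c, threshold) for c in node.children) if p is not None]
    if node.importance >= threshold or kept:
        return Node(node.title, node.importance, kept)
    return None


# Hypothetical soccer program with importance reflecting the user preferences.
original = Node("Soccer game", 0.0, [
    Node("First half", 0.5, [Node("Goal", 1.0), Node("Corner kick", 0.25)]),
    Node("Second half", 0.25, [Node("Throw-in", 0.0)]),
])
summarized = prune(original)
```

Here only the path leading to the goal scene survives, which corresponds to a summarized scene structure built from the scenes of highest importance.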

Also, when the user selects at least one scene that he/she wishes to reproduce using the scene structure displayed by the structure display unit 20 or the summary structure display unit 26 or using the scene thumbnails displayed by the thumbnail image display unit 21 or the search result display unit 24 via the user input unit 22, the content reproduction/display unit 27 reproduces/displays each selected scene contained in the content.

As described above, with the metadata reproduction apparatus 200 according to the third embodiment, it becomes possible to reproduce only each scene that the user wishes to watch using the metadata generated by the metadata editing apparatus according to the first embodiment or the second embodiment described above or to search for the scene desired by the user using the characteristic amounts described in the metadata.

In the third embodiment, the content reproduction/display unit 27 is provided within the metadata reproduction apparatus 200. However, this content reproduction/display unit may be provided in another apparatus. For instance, manipulations and displaying concerning reproduction of the metadata, such as displaying of the scene structure and the thumbnail images, may be performed by a mobile telephone, a portable information terminal, or the like, and processing and displaying concerning reproduction of the multimedia content may be performed by a terminal (PC, for instance) connected to the mobile telephone, the portable information terminal, or the like via a network.

FOURTH EMBODIMENT

In this fourth embodiment, a metadata delivery server (metadata delivery apparatus), which delivers the metadata of content to a client terminal, and a content delivery server, which scalably constructs the content with reference to the terminal capability of the client terminal and delivers the constructed content to the client terminal, will be described.

A content delivery system according to the fourth embodiment of the present invention will be described with reference to the accompanying drawings. FIG. 10 is a block diagram showing a construction of the content delivery system according to the fourth embodiment of the present invention.

Referring to FIG. 10, a content delivery system 300 includes a metadata delivery server 400, various client terminals 481 to 48 n, and a content delivery server 500.

The metadata delivery server 400 includes a metadata accumulation unit 41, a metadata analysis unit 42, a terminal capability judgment unit 43, a metadata re-generation unit 44, and a metadata delivery unit 45.

In the metadata accumulation unit 41, there is accumulated the metadata generated by the metadata editing apparatus of the first embodiment or the second embodiment described above, for instance. The metadata analysis unit 42 performs analysis of metadata 49 describing the structure and characteristics of content. The terminal capability judgment unit 43 judges the terminal capability of each client terminal based on information 51 concerning the capability of the client terminal. The metadata re-generation unit 44 restructures the content in accordance with the judged terminal capability of the client terminal based on a result 50 of the analysis of the metadata, and re-generates metadata 52 of the restructured content. The metadata delivery unit 45 delivers metadata 53 re-generated by the metadata re-generation unit 44 to the client terminals 481 to 48 n.

Note that the metadata accumulation unit 41 may be provided outside the metadata delivery server 400 of the fourth embodiment. In this case, the metadata delivery server 400 receives input of the metadata 49 from the metadata accumulation unit 41 via a network (not shown) or the like.

On the other hand, the content delivery server 500 includes a content accumulation unit 46 and a content delivery unit 47.

In the content accumulation unit 46, there is accumulated content 55. The content delivery unit 47 delivers content 56 to the client terminals 481 to 48 n in accordance with content delivery requests 54 from the client terminals.

As in the case of the metadata delivery server 400 described above, the content accumulation unit 46 may be provided outside the content delivery server 500. In this case, the content delivery server 500 receives input of the content data 55 via a network (not shown).

Next, how the content delivery system according to the fourth embodiment operates will be described with reference to the accompanying drawings.

First, on the metadata delivery server 400 side, the metadata analysis unit 42 performs analysis of the metadata accumulated in the metadata accumulation unit 41. The metadata analysis unit 42 operates in the same manner as the metadata analysis unit 19 of the metadata reproduction apparatus 200 of the third embodiment described above. By performing the analysis of the metadata, the metadata analysis unit 42 obtains information concerning the structure and characteristics of the content.

FIG. 11 shows content structure information outputted from the metadata analysis unit of the metadata delivery server according to the fourth embodiment, with the illustrated example relating to a news video. In FIG. 11, the hierarchical scene structure of the content is displayed in a tree form. Each node of the tree corresponds to one scene and is associated with various types of scene information. Here, the various types of scene information include a scene title, an abstract, time information giving a scene start position and a scene end position, a scene thumbnail, a representative frame, a thumbnail shot, a representative shot, and scene characteristics such as visual characteristic amounts concerning colors, motions, and the like. Note that in FIG. 11, among the various types of scene information, only the scene titles are shown.

Here, it is assumed that the client terminals are various information household devices having different terminal capabilities. The terminal capability refers to a communication speed, a processing speed, an image format that can be reproduced/displayed, an image resolution, a user input function, and the like. For instance, it is assumed that the client terminal 481 is a personal computer (PC) that has sufficient performance with respect to the communication speed, processing speed, display performance, and user input function. Also, it is assumed that the client terminal 482 is a mobile telephone and the remaining client terminals are each a PDA or the like. Each of the client terminals 481 to 48 n sends information concerning its terminal performance.

The terminal capability judgment unit 43 analyzes the information 51 that was sent from each of the client terminals 481 to 48 n and that shows the terminal performance of the client terminal, determines a deliverable image format, a maximum image resolution, a length of the content, and the like, and outputs them to the metadata re-generation unit 44. When the original content is video content encoded in MPEG-2 and has a high resolution, for instance, the original content can be reproduced by the client terminal 481 as it is because the client terminal 481 has sufficient performance as described above. Also, it is assumed that this client terminal 481 has a function with which it is possible to perform the image summary reproduction and searching described in the third embodiment described above. On the other hand, it is assumed that the client terminal 482 is capable of reproducing only short video shots encoded in MPEG-4 and the maximum resolution displayable by the client terminal 482 is low.

The metadata re-generation unit 44 restructures the content in accordance with the terminal performance of each of the client terminals 481 to 48 n informed by the terminal capability judgment unit 43, re-generates the metadata 52 describing the structure and contents of the restructured content, and outputs the metadata 52 to the metadata delivery unit 45. For instance, the original metadata is delivered to the client terminal 481 as it is, so that the restructuring of the content is not performed. On the other hand, the client terminal 482 has only the function of reproducing short video shots and is incapable of reproducing every scene, so that the restructuring of the content is performed for the client terminal 482 using short video shots of important scenes.

FIG. 12 shows an example of a content structure after the restructuring by the metadata re-generation unit of the content delivery system according to the fourth embodiment. As shown in FIG. 12, each important scene, out of scenes of the news video, is extracted and the content is restructured so as to include only the representative shot or representative frame of each extracted scene. Also, the client terminal 482 does not have the search function described in the above third embodiment, so that among the various types of scene information in the metadata, the scene characteristic amounts are not required to be included for searching. Therefore, the metadata re-generation unit 44 re-generates metadata describing only the structure of restructured scenes and the position information of the representative shots or representative frames of the scenes, and sends the metadata to the metadata delivery unit 45.
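The capability-dependent re-generation may be sketched as follows. The two-flag capability model, the dictionary keys, the importance threshold of 0.5, and the URLs are all assumptions introduced for illustration; the embodiment itself names a wider set of capabilities (communication speed, image format, resolution, and the like).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Capability:
    can_play_full_scenes: bool  # hypothetical flag derived from the terminal information
    supports_search: bool       # whether the terminal has the search function


def regenerate(scenes: List[Dict], cap: Capability, min_importance: float = 0.5) -> List[Dict]:
    """Re-generate scene metadata in accordance with the judged terminal capability."""
    if cap.can_play_full_scenes and cap.supports_search:
        return scenes  # deliver the original metadata as it is
    regenerated = []
    for s in scenes:
        if s["importance"] < min_importance:
            continue  # only important scenes are kept for a limited terminal
        entry = {"title": s["title"], "representative_shot": s["representative_shot"]}
        if cap.supports_search:
            entry["features"] = s["features"]  # characteristic amounts only if searchable
        regenerated.append(entry)
    return regenerated


# Hypothetical metadata of a news video.
news_metadata = [
    {"title": "Top story", "importance": 1.0,
     "representative_shot": "http://example.com/top.mp4", "features": {"color": [0.1]}},
    {"title": "Filler", "importance": 0.25,
     "representative_shot": "http://example.com/filler.mp4", "features": {"color": [0.9]}},
]
pc = Capability(can_play_full_scenes=True, supports_search=True)
phone = Capability(can_play_full_scenes=False, supports_search=False)
for_phone = regenerate(news_metadata, phone)
```

In this sketch the terminal modelled on the PC receives the metadata unchanged, while the terminal modelled on the mobile telephone receives only the title and the location of the representative shot of each important scene.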

The metadata delivery unit 45 delivers the metadata 53 generated by the metadata re-generation unit 44 to the client terminals 481 to 48 n.

Each of the client terminals 481 to 48 n analyzes the metadata 53 delivered by the metadata delivery unit 45 and obtains scene structure information of the content. When a user of each of the client terminals 481 to 48 n selects a scene that he/she wishes to reproduce, the client terminal transmits position information of the selected scene to the content delivery unit 47 of the content delivery server 500.

On receiving the scene position information from each of the client terminals 481 to 48 n, the content delivery unit 47 of the content delivery server 500 obtains corresponding content 55 from the content accumulation unit 46 and delivers the content to each of the client terminals 481 to 48 n. In the case of the client terminal 481, the content delivery unit 47 sends a scene start position and a scene end position and delivers a corresponding scene of the original content. On the other hand, in the case of the client terminal 482, the content delivery unit 47 sends the location information (such as the URL) of a scene representative shot. Note that when the representative shot is not reproducible/displayable by the client terminal 482 because of its image format, image resolution, image file size, or the like, the content delivery unit 47 performs various kinds of processing, such as format conversion, resolution conversion, and reduction in file size through content summarization, and sends resultant data to the client terminal 482.

As described above, with the metadata delivery server 400 of the fourth embodiment, it becomes possible to re-generate metadata in accordance with the capability of each of the client terminals 481 to 48 n and to deliver the re-generated metadata to each of the client terminals 481 to 48 n.

It should be noted here that in FIG. 10, the metadata delivery server 400 and the content delivery server 500 are shown as separate apparatuses, but the present invention is not limited to this. For instance, the content delivery server may be provided in the metadata delivery server or the metadata delivery server may be provided in the content delivery server. In addition, needless to say, the metadata delivery server and the content delivery server may be provided in the same server. In this case, it becomes possible for the terminal capability judgment unit 43 to easily inform the content delivery unit 47 of the capability of each of the client terminals 481 to 48 n, which makes it possible to restructure the content through format conversion or the like in accordance with the capability of the client terminal and to deliver the restructured content to each of the client terminals 481 to 48 n.

Also, the fourth embodiment has been described by assuming that in the metadata accumulation unit 41, the metadata generated by the metadata editing apparatus of the first embodiment or the second embodiment described above is accumulated. However, the present invention is not limited to this and, needless to say, metadata generated by an apparatus other than the metadata editing apparatus of the first embodiment or the second embodiment described above may be accumulated in the metadata accumulation unit 41.

FIFTH EMBODIMENT

In this fifth embodiment, another example of the metadata delivery server described in the above fourth embodiment will be described. The metadata delivery server of the above fourth embodiment performs the metadata re-generation based on the terminal information sent from each client terminal. In the fifth embodiment, however, in order to more appropriately perform the metadata re-generation, the metadata delivery server (metadata delivery apparatus) is provided with a metadata analysis/re-generation unit that performs the metadata re-generation using metadata optimization hint information that is hint information for the metadata re-generation.

The metadata delivery server according to the fifth embodiment of the present invention will be described with reference to the accompanying drawings. FIG. 13 is a block diagram showing a construction of the metadata delivery server according to the fifth embodiment of the present invention.

Referring to FIG. 13, a metadata delivery server 400A includes a hint information analysis unit 61, a metadata analysis/re-generation unit 63, and a metadata delivery unit 45.

The hint information analysis unit 61 analyzes metadata optimization hint information 60 and outputs a result of the analysis. The metadata analysis/re-generation unit 63 analyzes metadata 49 describing the structure and characteristics of content based on analyzed metadata optimization hint information 62 and a condition 65 concerning metadata re-generation such as information concerning the performances of the client terminals or user preferences, and outputs restructured metadata 64. Then, the metadata delivery unit 45 delivers metadata 53 to the client terminals.

In the metadata accumulation unit 41 (see FIG. 10), the metadata 49 describing the structure and characteristics of the content and the metadata optimization hint information 60 that is hint information for the re-generation of the metadata 49 are accumulated. Here, the metadata optimization hint information 60 for the re-generation of the metadata 49 is information describing the types of information contained in the metadata 49, the amount of the contained information, and the outline and complexity of the metadata 49.

Next, how the metadata delivery server according to the fifth embodiment operates will be described with reference to the accompanying drawings.

The metadata optimization hint information 60 will be described in detail by taking, as an example, a case of video content having the structure shown in FIG. 14.

A video content (Root) (Soccer game program) is broadly divided into two scenes (Scene 1 and Scene 2) corresponding to the first half and the second half, and the first half scene is further divided into multiple scenes (Scene1-1, Scene1-2, . . . , Scene1-n) (such as goal scenes and corner kick scenes). In FIG. 14, the temporal hierarchical structure among the scenes is indicated using a tree structure.

The metadata 49 corresponding to the video content describes the temporal hierarchical structure of the content, that is, the temporal relations among the scenes, and the start times and lengths of the scenes. The metadata 49 also describes text information (such as a title, abstract, category, and explanatory notes), importance, and the like of each scene as well as the characteristics (for instance, a color histogram or motion complexity) possessed by the scene in accordance with the hierarchical level of the scene. Note that in this fifth embodiment, it is assumed that MPEG-7 standardized by ISO is used as a metadata description format.

FIG. 15 shows how the metadata is described in MPEG-7. In MPEG-7, each scene is described in units called “video segment”. In each video segment, there are described time information (scene start point and length), a title, an outline, a category, and the like. Note that there is a case where the information described in each video segment is changed in accordance with the hierarchical level of the video segment. In the example shown in FIG. 15, importance is described in each video segment at Level 2 and Level 3, although no importance is described in each video segment at Level 4. Also, the characteristic amounts concerning colors and motions are described only in each video segment at Level 4.

It is possible to express the temporal hierarchical relations among the scenes by recursively describing the video segments. In the description example shown in FIG. 15, with a “time division” description, there is described a state where one video segment is composed of multiple video segments temporally divided. In MPEG-7, it is also possible to describe the spatial hierarchical structure possessed by the content in a like manner. In this case, instead of the “time division” description, a “space division” description is used to express a state where one segment is composed of multiple segments spatially divided.

The metadata optimization hint information 60 for the re-generation of the metadata 49 describes the types and contents of information (descriptors) contained in the metadata 49. Accordingly, in the case of the metadata shown in FIG. 15, the metadata optimization hint information 60 contains a descriptor (“time division”) expressing the temporal hierarchical structure possessed by the content, descriptors expressing the color histogram and the motion complexity, and descriptors expressing the title, abstract, category, and importance. Also, in order to express description contents and complexity, the depth of each video segment in the hierarchical structure is expressed with up to four levels (Level 1 to Level 4). Further, the importance assumes one of five discrete values ({0.0, 0.25, 0.5, 0.75, 1.0}). As importance with respect to viewpoints, there are described importance from the viewpoint of “Team A” and importance from the viewpoint of “Team B”. Also, there is described the hierarchical position at which the importance is described (video segment level at which the importance is described).

FIG. 16 shows an example of a format of the metadata optimization hint information 60. The metadata optimization hint information 60 shown in FIG. 16 contains metadata file information and metadata construction element information.

The metadata file information describes information for predicting resources required to process the metadata, such as the memory size required to accumulate/analyze the metadata and the processing system (S/W) required to analyze the metadata. In more detail, for instance, the metadata file information describes the location of a metadata file, the size of the metadata file, the format of the metadata file (for instance, the XML format or the binary format), syntax file information (location of a syntax file defining the syntax of the metadata), and an appearing element number showing the number of elements contained (appearing) in the metadata. Note that when the metadata is described in the XML format, the syntax file defining the format of the metadata file corresponds to a DTD file, a schema file, or the like defining the description format (syntax) of the metadata, and the syntax file information describes the location of the DTD file or the schema file, for instance.

The metadata construction element information is information describing the type and contents of each descriptor constituting the metadata. In more detail, the metadata construction element information contains the name of each descriptor contained in the metadata, the appearing frequency (number of appearing times) of the descriptor in the metadata, and a description (completeness of description) showing whether or not the descriptor contains every descriptor that has the possibility of being syntactically contained. In addition, when the descriptor is recursively described, the metadata construction element information also contains the temporal or spatial hierarchical property (maximum value of the depth) possessed by the descriptor. In the case of the metadata description shown in FIG. 15, for instance, “video segment” is the descriptor recursively described and has a hierarchical structure with up to four levels, so that the maximum hierarchical depth possessed by the “video segment” descriptor becomes four.

In addition, as to a descriptor contained in the descriptor recursively described, the hint information also describes the appearing position (hierarchical level) at which the contained descriptor appears. For instance, “importance” is a descriptor contained in the “video segment” descriptor and, when the “importance” is contained in the video segment at up to Level 3, that is, is not contained in the video segment at Level 4, the appearing position of the “importance” becomes up to Level 3. In this manner, the appearing position is specified using the hierarchical level. However, when an ID is assigned to each “video segment” containing the “importance” or the “video segment” itself, it is also possible to describe the appearing position as an ID list. Also, in the case of a descriptor having a value, the hint information additionally describes the type of the descriptor and the range of values that the descriptor can assume. When the importance is expressed using the five discrete values ({0.0, 0.25, 0.5, 0.75, 1.0}) with respect to each of the viewpoints of “Team A” and “Team B”, for instance, the assumable values of the “importance” become a list of {0.0, 0.25, 0.5, 0.75, 1.0} having a floating-point form. The above description is repeated for each descriptor that is a construction element of the metadata.
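The items listed above for the metadata file information and the metadata construction element information may be modelled, for illustration only, as the following data structures. The class names, field names, and all example values (file location, size, element count, appearing frequencies) are hypothetical; they are not the format of FIG. 16 or the example of FIG. 17.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MetadataFileInfo:
    location: str       # location of the metadata file
    size_bytes: int     # size of the metadata file
    file_format: str    # "XML" or "binary"
    syntax_file: str    # location of the DTD or schema file
    element_count: int  # appearing element number


@dataclass
class DescriptorHint:
    name: str
    occurrences: int                             # appearing frequency
    complete: bool                               # completeness of description
    max_depth: Optional[int] = None              # only for recursively described descriptors
    max_level: Optional[int] = None              # appearing position as a hierarchical level
    id_list: Optional[List[str]] = None          # alternative: appearing position as an ID list
    value_type: Optional[str] = None             # type of a descriptor having a value
    allowed_values: Optional[List[float]] = None # values the descriptor can assume


@dataclass
class OptimizationHint:
    file_info: MetadataFileInfo
    descriptors: List[DescriptorHint] = field(default_factory=list)

    def find(self, name: str) -> Optional[DescriptorHint]:
        """Look up the hint for a descriptor; None means it does not appear in the metadata."""
        return next((d for d in self.descriptors if d.name == name), None)


# Hypothetical hint for metadata resembling the soccer example.
hint = OptimizationHint(
    MetadataFileInfo("http://example.com/soccer.xml", 120_000, "XML",
                     "http://example.com/schema.xsd", 500),
    [
        DescriptorHint("video segment", occurrences=60, complete=True, max_depth=4),
        DescriptorHint("importance", occurrences=20, complete=True, max_level=3,
                       value_type="float", allowed_values=[0.0, 0.25, 0.5, 0.75, 1.0]),
    ],
)
```

A lookup such as `hint.find("importance")` then tells an analysis unit, before the metadata itself is parsed, whether the descriptor is present and down to which level it appears.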

FIG. 17 shows an example of the metadata optimization hint information described in accordance with the format shown in FIG. 16. It can be seen that the example of the metadata optimization hint information 60 shown in FIG. 17 contains the metadata file information and the metadata construction element information for descriptors such as the “video segment” descriptor and the “title” descriptor.

Next, a method for performing re-generation of metadata using the metadata optimization hint information 60 will be described with reference to FIG. 13.

The hint information analysis unit 61 performs analysis of the metadata optimization hint information 60 described in the specified format. The metadata analysis/re-generation unit 63 performs analysis of the metadata 49 using the analyzed metadata optimization hint information 62 outputted from the hint information analysis unit 61, and outputs the metadata 64 re-generated based on the condition 65 concerning the metadata re-generation.

FIG. 18 shows an example of a method with which the metadata analysis/re-generation unit 63 analyzes the metadata using the analyzed metadata optimization hint information 62. In this example, it is assumed that only video segments having importance of 0.5 or higher are extracted from the original metadata 49, and metadata composed of only a description concerning the extracted video segments is re-generated.

First, the metadata analysis/re-generation unit 63 specifies metadata necessary for re-generation based on the condition 65 for metadata re-generation (step S1). In this example, only video segments having importance of 0.5 or higher are extracted, so that “importance” and “video segment” are descriptors necessary for the re-generation.

Next, it is judged, using the analyzed metadata optimization hint information 62, whether or not the descriptors specified in step S1 are contained in the metadata 49 (step S2) (the following description will be made by taking a case of the “importance” descriptor as an example).

When the “importance” descriptor is contained in the metadata, analysis of the metadata is performed (step S3). On the other hand, when the “importance” descriptor is not contained, the metadata analysis processing is ended (step S4).

Also, when the analyzed metadata optimization hint information 62 specifies that the appearing position of the “importance” descriptor is up to Level 3 of the hierarchical structure, at the time when the analysis of the video segments up to Level 3 is finished (step S5), the analysis processing is ended without performing the analysis for Level 4 and the following hierarchical levels (step S6).

Likewise, when the metadata optimization hint information 62 specifies that the number of appearing times of the “importance” descriptor is 20, at the time when the analysis of 20 “importance” descriptors is finished (step S5), the analysis of the metadata is ended (step S6). After the metadata analysis processing is ended in step S4 or step S6, in order to perform the analysis of another piece of metadata 49 if necessary, the operations in step S1 and the following steps are repeated.
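By way of illustration only, the flow of FIG. 18 may be sketched as follows. The sketch assumes a hypothetical in-memory tree of video segments (`Segment`) and a hypothetical hint record (`Hint`) holding the deepest level and the number of appearing times of a descriptor; neither name, nor the traversal order, is prescribed by this description.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Segment:
    """Hypothetical video segment node; level 1 is the root."""
    level: int
    importance: Optional[float] = None
    children: List["Segment"] = field(default_factory=list)


@dataclass
class Hint:
    """Hypothetical hint record for one descriptor."""
    max_level: int  # deepest hierarchical level at which the descriptor appears
    count: int      # number of appearing times of the descriptor


def extract_important(root: Segment, hints: Dict[str, Hint],
                      threshold: float = 0.5) -> Tuple[List[Segment], int]:
    """Return (selected segments, number of segments actually analyzed)."""
    hint = hints.get("importance")
    if hint is None:                 # S2 -> S4: descriptor absent, nothing to analyze
        return [], 0
    selected: List[Segment] = []
    seen = 0                         # "importance" descriptors analyzed so far
    visited = 0
    stack = [root]
    while stack and seen < hint.count:       # S5/S6: stop once all appearances are seen
        seg = stack.pop()                    # S3: analyze one segment description
        visited += 1
        if seg.importance is not None:
            seen += 1
            if seg.importance >= threshold:
                selected.append(seg)
        if seg.level < hint.max_level:       # S5/S6: never descend below the hinted level
            stack.extend(reversed(seg.children))
    return selected, visited


# Example: four levels, "importance" appears four times, at Levels 2 and 3 only.
tree = Segment(1, children=[
    Segment(2, 0.3, [Segment(3, 0.5, [Segment(4)])]),
    Segment(2, 0.7, [Segment(3, 0.2, [Segment(4)])]),
])
selected, visited = extract_important(tree, {"importance": Hint(max_level=3, count=4)})
```

In this example the two Level 4 segments are never analyzed, which is the saving the hint information is intended to provide.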

FIG. 19 shows another example of the method for analyzing the metadata using the analyzed metadata optimization hint information 62. In this example, it is assumed that metadata is re-generated by extracting only video segments containing the “title” descriptor. The judgement whether or not the metadata contains the “title” descriptor is performed in the same manner as in the example shown in FIG. 18.

When the metadata contains the “title” descriptor, the metadata analysis/re-generation unit 63 judges whether or not a video segment matches an appearing position ID described in the metadata optimization hint information 62 (step S13).

When the video segment does not match the ID, the video segment does not contain the “title” descriptor, so that the analysis of the description of this video segment is skipped (step S16).

On the other hand, when the video segment matches the ID, in order to obtain the “title” descriptor, the analysis of the description of this video segment is performed (step S15).

Next, when the analysis of every video segment matching the appearing position ID is finished (step S17), no video segment containing the “title” descriptor remains in the metadata, so that the analysis processing is ended (step S18).

It should be noted here that in order to perform the analysis of another piece of metadata if necessary, the operations in step S11 and the following steps are repeated. Then, the metadata 64 restructured using the descriptor extracted through the analysis processing described above is outputted.
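The flow of FIG. 19 may likewise be sketched as follows, purely as an illustration. It is assumed here that each video segment description is available as a (segment ID, unparsed text) pair and that the unparsed text is JSON; both the representation and the function name `extract_titles` are hypothetical and chosen only so that the sketch is runnable.

```python
import json
from typing import Dict, Iterable, List, Set, Tuple


def extract_titles(segments: Iterable[Tuple[str, str]],
                   title_position_ids: Set[str]) -> Tuple[Dict[str, str], int]:
    """Return ({segment ID: title}, number of descriptions actually parsed)."""
    remaining = set(title_position_ids)   # appearing position IDs from the hint information
    titles: Dict[str, str] = {}
    parsed = 0
    for seg_id, raw in segments:
        if not remaining:                 # S17 -> S18: every matching segment is done
            break
        if seg_id not in remaining:       # S13 -> S16: skip without parsing
            continue
        description = json.loads(raw)     # S15: analyze this description only
        parsed += 1
        titles[seg_id] = description["title"]
        remaining.discard(seg_id)
    return titles, parsed


segments: List[Tuple[str, str]] = [
    ("seg1", '{"title": "Kick-off"}'),
    ("seg2", '{"importance": 0.5}'),
    ("seg3", '{"title": "Goal"}'),
    ("seg4", '{"importance": 0.2}'),
]
titles, parsed = extract_titles(segments, {"seg1", "seg3"})
```

Only the two descriptions whose IDs appear in the hint information are parsed; the remaining descriptions are passed over as raw text.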

The metadata delivery unit 45 delivers the restructured metadata 64 to the client terminals.

It should be noted here that although not shown, after the metadata re-generation, the location of the metadata file, the size of the metadata file, the number of elements appearing in the metadata, and the information concerning the metadata construction elements are also changed. Accordingly, metadata optimization hint information corresponding to the metadata after the re-generation may be re-generated.

It has conventionally been required to analyze every descriptor contained in metadata for metadata re-generation. In the fifth embodiment, however, the descriptor analysis of the metadata 49 is performed in the manner described above using the metadata optimization hint information 60 describing the list of each descriptor contained in the metadata 49, the appearing position of the descriptor, the number of appearing times, and the like. As a result, it becomes possible to omit the analysis of the metadata 49 itself for the metadata re-generation. Also, the analysis of each descriptor not matching the re-generation condition is omitted using the appearing position or the number of appearing times, so that it becomes possible to reduce the processing cost (such as the processing amount and the memory usage amount) required to perform the metadata analysis and re-generation.

SIXTH EMBODIMENT

In the fifth embodiment described above, there has been described the metadata delivery server that reduces the processing cost required to perform the metadata analysis and re-generation using the metadata optimization hint information for the metadata re-generation. In this sixth embodiment, however, a metadata search server (metadata search apparatus) will be described which reduces the processing cost required to perform metadata searching using the metadata optimization hint information.

The metadata search server according to the sixth embodiment of the present invention will be described with reference to the accompanying drawings. FIG. 20 is a block diagram showing a construction of the metadata search server according to the sixth embodiment of the present invention.

Referring to FIG. 20, a metadata search server 600 includes a hint information analysis unit 61, a metadata analysis unit 71, and a search unit 73.

The hint information analysis unit 61 is the same as that in the fifth embodiment described above and therefore the description thereof is omitted in this embodiment. The metadata analysis unit 71 analyzes an enormous amount of metadata 49 describing the structure and characteristics of content efficiently and at a lower processing cost using analyzed metadata optimization hint information 62 and a search condition 70. The search unit 73 searches for content matching the search condition using a result 72 of the analysis of the metadata.

Next, how the metadata search server according to the sixth embodiment operates will be described with reference to the accompanying drawings.

FIG. 21 is a flowchart showing how the metadata analysis unit of the metadata search server according to the sixth embodiment operates.

The metadata analysis unit 71 performs analysis of at least one piece of metadata using the metadata optimization hint information 62 corresponding to the metadata. In this example, the analysis of the metadata is the extraction, from the metadata, of each characteristic description necessary for the searching. For instance, when a video segment having specific color characteristic amounts is given as the search condition and each video segment having characteristics close to those of the given video segment is to be searched for, it is required to extract each video segment having a color characteristic description. In the metadata example shown in FIG. 15, a color characteristic description (“color histogram”) is added to each video segment at Level 4, so that a description concerning each video segment at Level 4 is extracted.

The metadata analysis unit 71 analyzes the search condition 70 and specifies a descriptor that is valid for the searching (step S21). Here, the search condition may be characteristic amounts described in a format defined in MPEG-7. Alternatively, the search condition may be an image, a keyword, or the like. When characteristic amounts (color arrangement information, for instance) described in the format defined in MPEG-7 are given as the search condition, each corresponding descriptor (color arrangement information) becomes a descriptor that is valid for the searching. Also, when a keyword is given as the search condition, each descriptor in a text form (such as a title, an abstract, or explanatory notes) becomes a descriptor that is valid for the searching.

Next, by referring to the metadata optimization hint information 62, it is judged whether the selected descriptor is contained in the metadata 49 (step S22). When the descriptor for the searching is not contained in the metadata 49, the analysis processing of the metadata 49 is ended (step S24) and the analysis of another piece of metadata 49 is performed if necessary.
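Steps S21, S22 and S24 may be illustrated by the following sketch. It assumes that a search condition is given as a hypothetical (kind, value) pair and that the hint information of each piece of metadata has already been reduced to a list of descriptor names; the file names and the helper names are illustrative only.

```python
from typing import Dict, List, Set, Tuple

# Text-form descriptors named in the description above.
TEXT_DESCRIPTORS = {"title", "abstract", "explanatory notes"}


def valid_descriptors(condition: Tuple[str, str]) -> Set[str]:
    """S21: determine which descriptors are valid for the given search condition."""
    kind, value = condition
    if kind == "keyword":
        return set(TEXT_DESCRIPTORS)
    if kind == "feature":            # value names the characteristic-amount descriptor
        return {value}
    raise ValueError(f"unsupported search condition kind: {kind!r}")


def metadata_worth_analyzing(condition: Tuple[str, str],
                             hints_by_file: Dict[str, List[str]]) -> List[str]:
    """S22/S24: keep only metadata whose hint information lists a valid descriptor."""
    wanted = valid_descriptors(condition)
    return [name for name, descriptors in hints_by_file.items()
            if wanted & set(descriptors)]


hints_by_file = {
    "a.xml": ["title", "importance"],
    "b.xml": ["color histogram"],
    "c.xml": ["importance"],
}
keyword_hits = metadata_worth_analyzing(("keyword", "goal"), hints_by_file)
feature_hits = metadata_worth_analyzing(("feature", "color histogram"), hints_by_file)
```

Metadata that is filtered out at this stage is never opened, so its analysis cost is avoided entirely.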

On the other hand, when the selected descriptor is contained in the metadata 49, the analysis of the metadata is performed (step S23). As to a metadata analysis method used in this embodiment, like in the case of the fifth embodiment described above, the metadata analysis processing shown in FIG. 18 or 19 is performed with efficiency using the metadata optimization hint information 62 (steps S25 and S26). As a result of the operations described above, the metadata analysis unit 71 extracts each characteristic description necessary for the searching.

The search unit 73 searches for content matching the search condition using the metadata analysis result (characteristic description necessary for the searching) 72 outputted from the metadata analysis unit 71. In this example, a description concerning each video segment having a color characteristic description (“color histogram”) is outputted by the metadata analysis unit 71, so that the search unit 73 judges compatibility with the color characteristic amounts (histogram) given as the search condition and outputs information (“time information”, for instance) concerning each video segment, whose judgment result is positive, as a search result 74.
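The compatibility judgment of the search unit 73 may be sketched as follows. The measure of compatibility is not specified in this description; the sketch assumes, for illustration only, an L1 distance between normalized histograms and a hypothetical threshold `max_distance`.

```python
from typing import Dict, List, Sequence, Tuple


def histogram_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """L1 distance between two histograms (an assumed measure, not prescribed)."""
    if len(a) != len(b):
        raise ValueError("histograms must have the same number of bins")
    return sum(abs(x - y) for x, y in zip(a, b))


def search(query_hist: Sequence[float], analyzed: List[Dict],
           max_distance: float = 0.2) -> List[Tuple[str, str]]:
    """Return the time information of each segment judged compatible with the query."""
    return [seg["time"] for seg in analyzed
            if histogram_distance(query_hist, seg["color_histogram"]) <= max_distance]


# Analysis result 72: only segments carrying a "color histogram" description.
analyzed = [
    {"time": ("00:00:10", "00:00:25"), "color_histogram": [0.5, 0.25, 0.25]},
    {"time": ("00:01:00", "00:01:30"), "color_histogram": [0.1, 0.1, 0.8]},
]
result = search([0.5, 0.3, 0.2], analyzed)
```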

As described above, in the sixth embodiment, the analysis of the metadata 49 is performed using the metadata optimization hint information 60, so that it becomes possible to omit the analysis of the metadata 49 itself for the metadata re-generation. Also, the analysis of each descriptor that is not necessary for the searching is omitted based on the appearing position and the number of appearing times, so that it becomes possible to reduce the processing cost (such as the processing amount and the memory usage amount) required to perform the metadata searching.

SEVENTH EMBODIMENT

In the fifth embodiment and the sixth embodiment described above, description has been made for a server side that uses the metadata optimization hint information. In this seventh embodiment, however, a client terminal (metadata re-generation condition setting apparatus) will be described which uses the metadata optimization hint information.

The client terminal according to the seventh embodiment of the present invention will be described with reference to the accompanying drawings. FIG. 22 is a block diagram showing a construction of the client terminal according to the seventh embodiment of the present invention.

Referring to FIG. 22, a client terminal 48A includes a hint information analysis unit 80, and a metadata re-generation condition setting unit 82.

It should be noted here that FIG. 22 shows only a portion of the function of the client terminal 48A that relates to means for setting a condition for metadata re-generation using metadata optimization hint information 60.

Next, how the client terminal according to the seventh embodiment operates will be described with reference to the accompanying drawing.

The hint information analysis unit 80 performs analysis of the metadata optimization hint information 60 described in a specified format. This hint information analysis unit 80 is the same as that in the fifth embodiment described above, so that the detailed description thereof is omitted in this embodiment.

Then, the metadata re-generation condition setting unit 82 performs setting of a condition 83 for metadata re-generation based on a result 81 of the analysis outputted from the hint information analysis unit 80. Here, the condition setting refers to selection of each descriptor that is unnecessary for the client terminal 48A from among various descriptors contained in the metadata optimization hint information 60, for instance. When the client terminal 48A is not provided with a search function using characteristic amounts, each descriptor expressing the characteristic amounts, such as a color histogram or motion complexity, is unnecessary.

As another example of the condition setting, when the complexity of the metadata is increased in accordance with an increase in depth in the hierarchical structure describing relations between scenes of content, the depth in the hierarchical structure processible by the client terminal is set based on the maximum value of the depth in the hierarchical structure described in the metadata optimization hint information 60. In still another example, a viewpoint of a user and a threshold value of scene importance are set based on assumable values of the importance described in the metadata optimization hint information 60.

When the importance assumes five discrete values ({0.0, 0.25, 0.5, 0.75, 1.0}) from each of the viewpoints of “Team A” and “Team B” as described above, the condition is set so that only each scene having the importance of 0.5 or higher from the viewpoint of “Team A” is selected, for instance.
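The three kinds of condition setting described above may be combined as in the following sketch. The dictionary layout of the analyzed hint information, the function `set_condition`, and the rule of rounding a requested threshold up to the nearest assumable value are all assumptions made for illustration.

```python
from typing import Dict, List

# Descriptors expressing characteristic amounts, as named in the description above.
FEATURE_DESCRIPTORS = {"color histogram", "motion complexity"}


def set_condition(hint: Dict, has_feature_search: bool, terminal_max_depth: int,
                  viewpoint: str, min_importance: float) -> Dict:
    """Build a hypothetical re-generation condition 83 from analyzed hint information."""
    # Descriptors the terminal cannot use are marked as unnecessary.
    drop: List[str] = sorted(
        d for d in hint["descriptors"]
        if d in FEATURE_DESCRIPTORS and not has_feature_search
    )
    # The requested depth never exceeds the depth actually present in the metadata.
    depth = min(terminal_max_depth, hint["max_depth"])
    # Snap the threshold to the smallest assumable value not below the request.
    candidates = [v for v in hint["importance_values"][viewpoint] if v >= min_importance]
    if not candidates:
        raise ValueError("no assumable importance value reaches the requested threshold")
    return {"drop": drop, "max_depth": depth,
            "viewpoint": viewpoint, "importance_threshold": min(candidates)}


hint = {
    "descriptors": ["title", "importance", "color histogram", "motion complexity"],
    "max_depth": 4,
    "importance_values": {"Team A": [0.0, 0.25, 0.5, 0.75, 1.0],
                          "Team B": [0.0, 0.25, 0.5, 0.75, 1.0]},
}
condition = set_condition(hint, has_feature_search=False, terminal_max_depth=2,
                          viewpoint="Team A", min_importance=0.5)
```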

The condition 83 for metadata re-generation set by the metadata re-generation condition setting unit 82 is sent to the metadata delivery server. On the metadata delivery server side, the metadata is restructured based on the metadata re-generation condition and the terminal performance of the client terminal. When the maximum value of the depth in the hierarchical structure of the original metadata is four and the depth in the hierarchical structure processible by the client terminal is set at two in the metadata re-generation condition, for instance, the structure of the metadata is restructured so that the maximum value of the depth in the hierarchical structure becomes two.
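One possible way of restructuring the hierarchy to a maximum depth of two is sketched below. This description does not state how the deeper levels are handled; the sketch simply discards every level below the limit, which is an assumption, and it uses a hypothetical nested-dictionary representation of the metadata.

```python
from typing import Dict


def limit_depth(segment: Dict, max_depth: int, level: int = 1) -> Dict:
    """Return a copy of the segment tree with levels deeper than max_depth removed."""
    pruned = {k: v for k, v in segment.items() if k != "children"}
    if level < max_depth:
        children = [limit_depth(c, max_depth, level + 1)
                    for c in segment.get("children", [])]
        if children:
            pruned["children"] = children
    return pruned


def depth(segment: Dict) -> int:
    """Maximum depth of the hierarchical structure rooted at this segment."""
    return 1 + max((depth(c) for c in segment.get("children", [])), default=0)


original = {"title": "Match", "children": [
    {"title": "First half", "children": [
        {"title": "Attack", "children": [{"title": "Shot"}]},
    ]},
    {"title": "Second half"},
]}
restructured = limit_depth(original, max_depth=2)
```

The original tree is left unmodified, so the server may re-generate metadata for other terminals from the same source.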

Also, when the metadata re-generation condition has been set so that only each scene having the importance of 0.5 or higher from the viewpoint of “Team A” is selected, metadata composed of only each scene matching the condition is re-generated. As a result, like in the fifth embodiment described above, it becomes possible to perform the metadata re-generation with efficiency using the metadata optimization hint information.

As described above, in the seventh embodiment, the metadata re-generation condition is set using the metadata optimization hint information 60, so that it becomes possible to generate appropriate metadata in accordance with the client terminal or application.

EIGHTH EMBODIMENT

In the fifth embodiment and the sixth embodiment described above, there has been described a server that re-generates metadata using the metadata optimization hint information and delivers the re-generated metadata. In this eighth embodiment, however, a content delivery server (content delivery apparatus) will be described which analyzes metadata using the metadata optimization hint information, re-generates content suited for the client terminal or user preferences using a result of the analysis, and delivers the re-generated content.

The content delivery server according to the eighth embodiment of the present invention will be described with reference to the accompanying drawings. FIG. 23 is a block diagram showing a construction of the content delivery server according to the eighth embodiment of the present invention.

Referring to FIG. 23, a content delivery server 500A includes a hint information analysis unit 61, a metadata analysis unit 86, and a content restructuring/delivery unit 88.

Next, how the content delivery server according to the eighth embodiment operates will be described with reference to the accompanying drawings.

The hint information analysis unit 61 operates in the same manner as in the fifth embodiment described above, so that the description thereof is omitted in this embodiment.

The metadata analysis unit 86 performs analysis of metadata 49 using analyzed metadata optimization hint information 62 outputted from the hint information analysis unit 61, and extracts each description matching information concerning the client terminal or a condition 85 concerning content restructuring such as user preferences. The analysis using the hint information is the same as that in the fifth embodiment described above. However, this eighth embodiment differs from the fifth embodiment in that content restructuring, rather than metadata re-generation, is performed using each extracted description. Each description extracted by the metadata analysis unit 86, that is, analyzed metadata 87, is outputted to the content restructuring/delivery unit 88.

The content restructuring/delivery unit 88 performs restructuring of content 89 based on each description extracted by the metadata analysis unit 86. Here, the following description will be made based on the example described in the above fifth embodiment. In the fifth embodiment, only each video segment whose importance is 0.5 or higher is extracted from the metadata 49 and metadata composed of only each description concerning the extracted video segment is re-generated.

In a like manner, in this eighth embodiment, only each video segment whose importance is 0.5 or higher is extracted from the metadata 49 and content 90 composed of only each scene corresponding to the extracted video segment is restructured and is delivered. In the description concerning the extracted video segment, the location of corresponding content and the position (time information) of the video segment in the content are described. Therefore, it is possible to clip each corresponding scene from the content, to restructure a single content 90 using the clipped scene, and to deliver the restructured content 90. Alternatively, it is possible to clip each corresponding scene from the content and to sequentially deliver the clipped scene.
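The clipping step may be illustrated by the following sketch, which only computes a clip list (an edit decision list) and does not touch any video data. The field names, the use of seconds for the time information, and the content location string are hypothetical.

```python
from typing import Dict, List, Tuple


def build_clip_list(segments: List[Dict],
                    threshold: float = 0.5) -> Tuple[List[Dict], float]:
    """Return (clips to cut from the source, total duration of restructured content)."""
    clips: List[Dict] = []
    offset = 0.0                       # position of the next clip in content 90
    for seg in segments:
        if seg["importance"] < threshold:
            continue                   # scene not selected by the condition
        clips.append({"source": seg["location"],
                      "start": seg["start"], "end": seg["end"],
                      "output_start": offset})
        offset += seg["end"] - seg["start"]
    return clips, offset


# Extracted descriptions: content location plus time information in seconds.
segments = [
    {"location": "content/match.mpg", "start": 0, "end": 30, "importance": 0.25},
    {"location": "content/match.mpg", "start": 30, "end": 50, "importance": 0.75},
    {"location": "content/match.mpg", "start": 50, "end": 90, "importance": 0.5},
]
clips, total = build_clip_list(segments)
```

For sequential delivery, the same clip list can be consumed one entry at a time instead of being concatenated into a single content.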

As described above, with the content delivery server 500A according to the eighth embodiment, the metadata analysis is performed using the metadata optimization hint information 60 describing a list of each descriptor contained in the metadata 49, the appearing position of the descriptor, the number of appearing times, and the like, so that it becomes possible to omit the analysis of the metadata 49 itself for the metadata re-generation. Also, the analysis of each descriptor not matching the re-generation condition is omitted using the appearing position and the number of appearing times, so that it becomes possible to reduce the processing cost (such as the processing amount and the memory usage amount) required to perform the metadata analysis and the content restructuring at the time of re-generation and delivery of content suited for the client terminal and user preferences.

INDUSTRIAL APPLICABILITY

As described above, with the present invention, multimedia content containing moving pictures and audio is divided into multiple scenes, editing of the multiple scenes is performed, and metadata that is scene structure information describing the hierarchical structure of the multimedia content is generated. As a result, it becomes possible to generate metadata describing the hierarchical structure possessed by multimedia content containing video data and the like.

1. A hint information description method comprising: describing, as hint information for manipulation of metadata composed of at least one descriptor describing semantic content, a structure, and characteristics of content, a name or an identifier of each descriptor contained in the metadata.